[Figure 3.28 panels: (a) Weight oscillation of ReActNet; (b) Weight distribution of ReActNet; (c) Illustration of weight oscillation.]
FIGURE 3.28
(a) We show the epoch-wise weight oscillation of ReActNet. (b) We randomly select two
channels of the first 1-bit layer in ReActNet [158]. The distribution has three peaks
centered around {−1, 0, +1}, which magnifies the non-parametric scaling factor (red line).
(c) We illustrate the weight oscillation caused by such inappropriate scale calculation, where
w and L indicate the latent weight and network loss function (blue line), respectively.
a result, we apply this set of hyperparameters to the remaining experiments in this chapter.
Note that the recurrent model has no effect when τ is set to 1.
3.9
ReBNN: Resilient Binary Neural Network
Conventional BNNs [199, 158] are often sub-optimized due to their intrinsic frequent weight
oscillation during training. We first identify that the weight oscillation mainly originates
from the non-parametric scaling factor. Figure 3.28(a) shows the epoch-wise oscillation4
of ReActNet, where the weight oscillation persists even after the network has converged.
As shown in Fig. 3.28(b), the conventional ReActNet [158] possesses a channel-wise tri-
modal distribution in the 1-bit convolution layers, whose peaks, respectively, center around
{−1, 0, +1}. This distribution leads to a magnified scaling factor α, and thus the quantized
weights ±α are much larger than the small weights around 0, which might cause the weight
oscillation. As illustrated in Fig. 3.28(c), in BNNs the real-valued latent tensor is binarized
by the sign function and scaled by the scaling factor (the orange dot) in forward propagation.
In backward propagation, the gradient is calculated based on the quantized value ±α (indicated
by the yellow dotted line). However, the gradient of small latent weights is misleading
when weights around ±1 magnify the scaling factor, as in ReActNet (Fig. 3.28(b)). The
update is then applied to the latent value (the black dot), leading to latent weight
oscillation. Because BNNs have only minimal representation states, latent weights with small
magnitudes frequently oscillate during the non-convex optimization.
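To make this mechanism concrete, the following PyTorch sketch mimics the scheme just described: binarization with the sign function, a non-parametric channel-wise scaling factor computed as the mean of the absolute latent weights (XNOR-Net-style), and a straight-through estimator in the backward pass. It is an illustration of the mechanism rather than the code of ReActNet [158]; the tensor shapes and noise levels are arbitrary assumptions.

```python
# Illustrative sketch (not the ReActNet code): binarize latent weights with
# sign(), scale by the non-parametric channel-wise factor alpha = mean(|w|),
# and back-propagate with a straight-through estimator. With a tri-modal latent
# distribution, alpha is dominated by the weights near +/-1, so the quantized
# values +/-alpha dwarf the latent weights near 0.
import torch


class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # per output channel
        return alpha * torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # straight-through estimator: pass gradients through where |w| <= 1
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)


# toy latent weights with peaks near -1, 0, and +1 (assumed shapes)
w = torch.cat([
    -1 + 0.05 * torch.randn(4, 8, 3, 3),
    0.05 * torch.randn(4, 8, 3, 3),
    1 + 0.05 * torch.randn(4, 8, 3, 3),
], dim=1).requires_grad_(True)

w_b = BinarizeSTE.apply(w)
print(w_b.abs().max())   # roughly 0.7: much larger than the near-zero latents
w_b.sum().backward()     # the pass-through gradient is what updates w
```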
We aim to introduce a Resilient Binary Neural Network (ReBNN) [258] to address the
problem above. The intuition of our work is to relearn the channel-wise scaling factor and the
latent weights in a unified framework. Consequently, we propose parameterizing the scaling
factor and introducing a weighted reconstruction loss to build an adaptive training objective.
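As a rough sketch of what such an adaptive objective can look like, the PyTorch module below treats the channel-wise scaling factor as a learnable parameter and adds a reconstruction term between the binarized and latent weights to the task loss. The module name, the fixed coefficient recon_weight, and the way the extra loss is collected are assumptions for illustration; ReBNN [258] balances this term adaptively rather than with a fixed coefficient.

```python
# Illustrative sketch, not the released ReBNN code: the scaling factor alpha is
# a learnable (parameterized) per-channel parameter trained jointly with the
# latent weights, and a reconstruction loss ties alpha*sign(w) back to w.
# The fixed coefficient `recon_weight` is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParamScaledBinaryConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k, stride=1, padding=0, recon_weight=1e-4):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_ch, in_ch, k, k))
        self.alpha = nn.Parameter(torch.ones(out_ch, 1, 1, 1))  # learnable scale
        self.stride, self.padding = stride, padding
        self.recon_weight = recon_weight

    def forward(self, x):
        w = self.weight
        # forward: alpha * sign(w); backward: straight-through, scaled by alpha
        w_b = self.alpha * ((torch.sign(w) - w).detach() + w)
        # weighted reconstruction loss between binarized and latent weights
        self.recon_loss = self.recon_weight * F.mse_loss(w_b, w)
        return F.conv2d(x, w_b, stride=self.stride, padding=self.padding)


layer = ParamScaledBinaryConv2d(16, 32, 3, padding=1)
out = layer(torch.randn(2, 16, 8, 8))
loss = out.mean() + layer.recon_loss  # stand-in task loss + reconstruction term
loss.backward()
```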
4A toy example of weight oscillation: from iteration t to t + 1, a misleading weight update causes
the weight to oscillate from −1 to 1, and from iteration t + 1 to t + 2 another misleading update flips it back from 1 to −1.